Attempt to fix Future L1 Track SW + bugs in HLS memory classes. #347

tomalin · 2024-09-13T17:25:15Z

The Future L1Trk CMSSW SW has not yet migrated to the 2FPGA solution. I'm therefore trying to bodge it so it continues to work (for now) with the 1FPGA solution.

The three functions in TrackBuilder_parameters.h with names getMPAR*() caused the CMSSW compiler to fail because of "duplicate function definitions". I fixed this by declaring them to be "constexpr" or "inline" in generate_TB.py. I also (on style grounds) changed their template parameter from type int to TF::seed.
MemoryTemplateBinnedCM::clear() used constant NCOPY, which caused array-out-of-bounds errors in CMSSW, where NCOPY can be zero. Replaced NCOPY with NCP.
CMSSW compiler errors about "duplicate code", solved by deleting the calculation of nentries withing CMSSW_GIT_HASH that was in MemoryTemplateBinnedCM & MemoryTemplateBinned & MemoryTemplatePROJ.
Added constants DEPTH_BX and DEPTH_ADDR to MemoryTemplate.h, MemoryTemplateBinnedCM.h and MemoryTemplateTPROJ.h, to simplify them. In MemoryTemplateTPROJ.h, replaced magic number 128 by DEPTH_ADDR.
Modified MemoryTemplateTPROJ to it instantiates only one page of memory when used inside CMSSW.
Replaced argument addr_index of MemoryTemplate::write_mem_new() by argument "ap_uint<1> overwrite", with the same number of bits used by MatchProcessor.h HERE when it calls write_mem_new().
Howver, this code inside write_mem_new(), which should protect the memory against having too many entries does nothing, since addr_index is a 1 bit number. This is a BUG. I replaced it by a check that nentries_[ibx] is than than (1<<NBIT_ADDR). However, array access can be slow in FW, so one should check if this causes FW timing errors.
Besides the style issue that it's hard-wiring the number of pages to 4, https://github.com/cms-L1TK/firmware-hls/blob/master/TestBenches/FileReadUtility.h#L259 seems to incorrectly assume that MemoryTemplateTPROJ::getDepth() returns the total depth of the memory, whereas it can be seen from https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/MemoryTemplateTPROJ.h#L49 that it returns a number that is a factor NPAGE less than this. I believe this is a BUG and that compareMemWithMemPage() is only checking one quarter of the memory. Fixed.
SWLUTReader.h was writing by 1 unit beyond the array bounds of the input "lut" array, because it neglected that the input .tab files contain a final line with a "close bracket" but no number. This BUG had horrible consequences, causing all LUT contents to become zero in CMSSW. Now fixed.
Modified download.sh, so that when running the CombinedCM chain, the TP takes its LUT tables from LUTsSplit/ instead of LUTsCM/ . This reduces the rate of memory errors, as LUTsCM/ is out-of-date. All other modules in the chain already used LUTsSplit/.
To improve clarity of VMRouterCM.h & MatchProcessor.h, renamed nAllInnerCopies to nAllInnerVariants, since the versions of the AllStubsInner memories that are produced are not copies of each other, but are different from each other. Similarly, renamed maxFullMatchCopies to maxFullMatchVariants.

THIS PR NEEDS DISCUSSION WITH @aehart and @aryd BEFORE MERGING.

tomalin · 2024-09-13T17:35:06Z

Strange. This fails git CI because of 2FA authentification issues https://github.com/cms-L1TK/firmware-hls/actions/runs/10853460640/job/30121761689?pr=347 . @aehart the L1Trk CMSSW code in our github cms-L1TK area runs CI in gitlab, and gitlab has a token (which I created) https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/settings/access_tokens so that it can access our github repo. Our github repo stores this at https://github.com/cms-L1TK/cmssw/settings/secrets/actions . The firmware-hls cms-L1TK github repo has no such tokens https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/settings/access_tokens . But I'd have predicted that this would could 2FA failures for the HLS since Oct. 2023, not just now.

tomalin · 2024-09-17T12:53:38Z

QUESTIONS/ISSUES:

With this PR, the L1Trk Future SW runs in CMSSW14, but reports huge numbers of memory content inconsistencies in the TP, MP & TB. Although we use the CombinedCM wiring map, we use the memory content .txt files & LUT tables from the FPGASplit/ solution, as they are more up-to-date, and reduce the error count. The errors seen in the TP are due to expected the Tracklets all being found, but stored in a different order in the TPAR memories to in the reference files.
Anders's branch of the old CMSSW emulation no longer writes the 1-FPGA solution memories to the .txt files. This will need fixing if the Future SW is not to see more memory errors.
- ANDERS suggests to remake the txt files, after modifying TrackletProjectionMemories::writeTPROJ() in the old emulation code, such that e.g. MPROJ_L1L2ABC_D1PHIA pages 1-3 are is split into separate files for page (1,2,3) TPROJ_L1L2(A,B,C)_D1PHIA etc.
The TrackletProcessor writes TrackletParameterMemory objects that inherit from MemoryTemplateTPROJ, and so have 4 pages of memory, despite the fact that the TrackletProcessor only writes to the first page. Is this wasting memory?
- ANDERS SAYS: They use 1 BRAM, so waste of memory doesn't matter.
Why is MemoryTemplate::write_mem_new() only called for the FullMatchMemory (from MatchProcessor.h), when several other memory types are based on MemoryTemplate. Is this function is an improvement on write_mem(), why not use it for all memory types?
- ANDERS THINKS write_mem_new() should probably be used everywhere. But needs checking.
MemoryTemplateBinnedCM::write_mem(ibx, slot, data, nentry_ibx) has two checks HERE and HERE that the number of entries is not too big to overflow. We don't need to cut twice, so this is a BUG. The first one cuts on nentry_ibx, which is a historic variable, no longer incremented inside the memory when new entries are added. The second cuts on nentry, which is the new variable, but is only applied when not in synthesis, whereas the first cut is also applied when in synthesis. I'M UNSURE HOW TO FIX THIS.
MemoryTemplate::write_mem() increments the number of entries twice, once HERE and once HERE. Is this a BUG?
Several data members (e.g. ireg__ and ireg___ and innerstub__ and innerstub___ ) are not initialized inside the constructor of TrackletEngineUnit and perhaps they should be set also in TrackletEngineUnit::reset()? I suspect they are first used by TrackletProcessor before they are initialized, which leads to unpredictable results in CMSSW. (If I initialize all these to zero in TrackletEngineUnit, and modify the default constructors of AllStubInnerMemory and VMStubTEOuterMemoryCM, so that they set data_ = 0), I see small changes in Future SW tracking performance in CMSSW 13). Is this a BUG?
All HLS modules should use SWLUTReader() to access their LUT tables to reduce compilation memory use, but only the TrackletProcessor does. There is also no consistency in which directories the LUT tables are read from. The IR & TP read them from LutsSplit/ whereas the other modules read them from the symbolic links such as VMRCM/ . These are equivalent, but we should be consistent.
I can think of no good reason for defining 5 identical typedef's https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/TrackletParameterMemory.h#L161 . Is this a mistake?

…umber

tomalin · 2024-11-18T16:02:05Z

Rebased to latest master

TestBenches/FileReadUtility.h

aehart · 2024-11-19T11:37:46Z

TrackletAlgorithm/MemoryTemplateBinnedCM.h

-	nentries_[ibx*kNBinsRZ+ibin].range(ireg*4+3,ireg*4)=nentry+1;
-	if (ibin!=0) {
-	  nentries_[ibx*kNBinsRZ+ibin-1].range((ireg+8)*4+3,(ireg+8)*4)=nentry+1;
-	}
-	binmask8_[ibx][ibin].set_bit(ireg,true);
+    	nentries_[ibx*kNBinsRZ+ibin].range(ireg*4+3,ireg*4)=nentry+1;
+    	if (ibin!=0) {
+	      nentries_[ibx*kNBinsRZ+ibin-1].range((ireg+8)*4+3,(ireg+8)*4)=nentry+1;
+    	}
+	    binmask8_[ibx][ibin].set_bit(ireg,true);

-	//icopy comparison must be signed int or future SW fails
-        writememloop:for (signed int icopy=0;icopy< (signed) NCP;icopy++) {
+    	//icopy comparison must be signed int or future SW fails
+      writememloop:for (signed int icopy=0;icopy< (signed) NCP;icopy++) {
 #pragma HLS unroll
-	  dataarray_[icopy][ibx][getNEntryPerBin()*slot+nentry] = data;
-	}
+	      dataarray_[icopy][ibx][getNEntryPerBin()*slot+nentry] = data;
+    	}


@tomalin These changes just seem to affect the indentation, which was already a bit off, and now looks even more off, at least in my text editor. Could you clean this up, perhaps removing tabs altogether and consistently using two spaces to indent lines inside brackets?

The problem is that the spaces in the HLS code are often created with "tab", and the number of spaces that "tab" corresponds to is not well-defined. It differs on different computers and in different editors/viewing tools. To fix this, we'd really have to globally replace all tabs by spaces, and then run clang to enforce a sensible indentation. And ask people to stop using tabs when editing the HLS code. But this should wait until there are no outstanding PRs.

I completely agree, but I also think we can clean things up as we change other things. Especially with this block of code in particular, where the only changes that have been made are to the indentation, unless I've missed something.

I can do this, but it would mean replacing all tabs with spaces in every line of every file where you've queried my changes to the indentation. And this of course would make it harder during your PR review to spot what I've changed. I suggest I make this change at the end, after I've answered all your non-indentation related comments?
P.S. There are other changes in this file, such as the introduction of "static constexpr unsigned int DEPTH_BX = 1<<NBIT_BX;".

OK, these indentation comments are the only ones left. The block highlighted here is the most significant example, since the only changes you made, AFAICT, are to the indentation. Some of those changes also don't make sense, like lines 156 and 160 being in different columns now.

I included suggestions for the other blocks I highlighted that you can simply accept if you like, so I hope these comments are fairly trivial to address.

aehart · 2024-11-19T11:37:53Z

TrackletAlgorithm/MemoryTemplateBinnedCM.h

+      for (size_t icopy=0; icopy < NCP; icopy++) {
+       	for (size_t ibin=0; ibin < kNMemDepth; ibin++) {
+	        dataarray_[icopy][ibx][ibin] = data;
+      	}


@tomalin Same comment regarding indentation.

Suggested change

for (size_t icopy=0; icopy < NCP; icopy++) {

for (size_t ibin=0; ibin < kNMemDepth; ibin++) {

dataarray_[icopy][ibx][ibin] = data;

}

for (size_t icopy=0; icopy < NCP; icopy++) {

for (size_t ibin=0; ibin < kNMemDepth; ibin++) {

dataarray_[icopy][ibx][ibin] = data;

}

aehart · 2024-11-19T11:38:08Z

TrackletAlgorithm/MemoryTemplateTPROJ.h

+    //assert(page < NPAGE);    
+  	// TODO: check if valid  
+	  if(!NBIT_BX) ibx = 0;
+ 	  return dataarray_[ibx][DEPTH_ADDR*page+index];


@tomalin Same comment regarding indentation.

Suggested change

//assert(page < NPAGE);

// TODO: check if valid

if(!NBIT_BX) ibx = 0;

return dataarray_[ibx][DEPTH_ADDR*page+index];

//assert(page < NPAGE);

// TODO: check if valid

if(!NBIT_BX) ibx = 0;

return dataarray_[ibx][DEPTH_ADDR*page+index];

aehart · 2024-11-19T11:38:14Z

TrackletAlgorithm/MemoryTemplateTPROJ.h

+      	nentries_[ibx*NPAGE+page] = 0;
+      	for (size_t addr=0; addr<DEPTH_ADDR; ++addr) {
+	        dataarray_[ibx][DEPTH_ADDR*page+addr] = data;
+      	}


@tomalin Same comment regarding indentation.

Suggested change

nentries_[ibx*NPAGE+page] = 0;

for (size_t addr=0; addr<DEPTH_ADDR; ++addr) {

dataarray_[ibx][DEPTH_ADDR*page+addr] = data;

}

nentries_[ibx*NPAGE+page] = 0;

for (size_t addr=0; addr<DEPTH_ADDR; ++addr) {

dataarray_[ibx][DEPTH_ADDR*page+addr] = data;

}

aehart · 2024-11-19T11:38:21Z

TrackletAlgorithm/MemoryTemplateTPROJ.h

@@ -209,7 +210,7 @@ class MemoryTemplateTPROJ

  void print_entry(BunchXingT bx, NEntryT index, unsigned int page = 0) const
  {
-	print_data(dataarray_[bx][(1<<NBIT_ADDR)*page+index]);
+  	print_data(dataarray_[bx][DEPTH_ADDR*page+index]);


@tomalin Same comment regarding indentation.

Suggested change

print_data(dataarray_[bx][DEPTH_ADDR*page+index]);

print_data(dataarray_[bx][DEPTH_ADDR*page+index]);

I've replaced all tabs by spaces in MemoryTemplate.h MemoryTemplateBinnedCM.h & MemoryTemplateTPROJ.h , and then adjusted the number of spaces, such that it increases by 2, whenever, there is an additional pair of curly brackets.

emData/generate_TB.py

aehart

These changes look good to me. I'll go ahead and merge now, and we should discuss the remaining open questions, perhaps in the next TF FW meeting.

tomalin requested review from aryd and aehart September 18, 2024 08:22

tomalin changed the title ~~Attempt to fix Future L1 Track SW (DO NOT MERGE YET)~~ Attempt to fix Future L1 Track SW Sep 23, 2024

tomalin marked this pull request as ready for review September 24, 2024 08:50

tomalin changed the title ~~Attempt to fix Future L1 Track SW~~ Attempt to fix Future L1 Track SW + bugs in HLS memory classes. Sep 24, 2024

tomalin added 18 commits November 18, 2024 15:47

Fix compile errors for templated functions

2d73c03

Fixed clashes with Future SW code

67b785f

fix to earlier commit

4e8034c

Modify write_mem so it can handle old txt files that dont have page n…

f2af504

…umber

Fix bug that only one quarter of memory being checked

e04b77b

Fix writing outside array bug

54ec7db

tweak

680159c

Optimised CPU use

c4c8ac3

fix compilation issue

d229b00

tweak

34d1e99

Fix bug in this PR

705052c

Fixed bug in PR

a3255b7

Fixed bug in PR

9f4b729

Fix bug in PR

9d95849

Switch to MemPrintSplit input files for Future SW

4beb80b

Changed TP tp used LutsSplit with CombinedCM chain

a4b91c1

Give variable better name

d8ed714

Give variable better name

bdac786

tomalin force-pushed the ianFixFutureSW branch from 5f7f389 to bdac786 Compare November 18, 2024 16:01

aehart reviewed Nov 19, 2024

View reviewed changes

TestBenches/FileReadUtility.h Show resolved Hide resolved

aehart reviewed Nov 19, 2024

View reviewed changes

emData/generate_TB.py Show resolved Hide resolved

Replace tabs by spaces

83b5bec

aehart approved these changes Dec 4, 2024

View reviewed changes

aehart merged commit 5f3bd55 into master Dec 4, 2024
1 check passed

aehart deleted the ianFixFutureSW branch December 4, 2024 21:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attempt to fix Future L1 Track SW + bugs in HLS memory classes. #347

Attempt to fix Future L1 Track SW + bugs in HLS memory classes. #347

tomalin commented Sep 13, 2024 •

edited

Loading

tomalin commented Sep 13, 2024 •

edited

Loading

tomalin commented Sep 17, 2024 •

edited

Loading

tomalin commented Nov 18, 2024

aehart Nov 19, 2024

tomalin Nov 19, 2024

aehart Nov 19, 2024

tomalin Nov 22, 2024 •

edited

Loading

aehart Dec 3, 2024

aehart Nov 19, 2024

aehart Nov 19, 2024

aehart Nov 19, 2024

aehart Nov 19, 2024

tomalin Dec 3, 2024

aehart left a comment

	print_data(dataarray_[bx][DEPTH_ADDR*page+index]);
	print_data(dataarray_[bx][DEPTH_ADDR*page+index]);

Attempt to fix Future L1 Track SW + bugs in HLS memory classes. #347

Attempt to fix Future L1 Track SW + bugs in HLS memory classes. #347

Conversation

tomalin commented Sep 13, 2024 • edited Loading

tomalin commented Sep 13, 2024 • edited Loading

tomalin commented Sep 17, 2024 • edited Loading

tomalin commented Nov 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tomalin Nov 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aehart left a comment

Choose a reason for hiding this comment

tomalin commented Sep 13, 2024 •

edited

Loading

tomalin commented Sep 13, 2024 •

edited

Loading

tomalin commented Sep 17, 2024 •

edited

Loading

tomalin Nov 22, 2024 •

edited

Loading